Expected Eligibility Traces

Authors

Abstract

The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence. Eligibility traces enable efficient credit assignment to the recent sequence of states and actions experienced by the agent, but not to counterfactual sequences that could also have led to the current state. In this work, we introduce expected eligibility traces. Expected traces allow, with a single update, to update states and actions that could have preceded the current state, even if they did not do so on this occasion. We discuss when expected traces provide benefits over classic (instantaneous) traces in temporal-difference learning, and show that sometimes substantial improvements can be attained. We provide a way to smoothly interpolate between instantaneous and expected traces by a mechanism similar to bootstrapping, which ensures that the resulting algorithm is a strict generalisation of TD(λ). Finally, we discuss possible extensions and connections to related ideas, such as successor features.
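The abstract contrasts instantaneous traces with expected traces. As a rough illustration only (a minimal sketch, not the paper's implementation), the following shows linear TD(λ) with accumulating eligibility traces, plus the expected-trace idea: learn a per-state average trace `zbar[s] ≈ E[e_t | S_t = s]` and use it in place of the instantaneous trace, so a single update can also credit states that could have preceded `s` on other trajectories. All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

def td_lambda(features, rewards, alpha=0.1, gamma=0.9, lam=0.8):
    """Linear TD(lambda) with accumulating eligibility traces.

    features: feature vectors x_0..x_T (terminal feature is all zeros)
    rewards:  rewards r_1..r_T, one per transition
    """
    w = np.zeros_like(features[0])   # value weights: v(s) ~ w @ x(s)
    e = np.zeros_like(features[0])   # instantaneous eligibility trace
    for t, r in enumerate(rewards):
        x, x_next = features[t], features[t + 1]
        delta = r + gamma * w @ x_next - w @ x   # TD error
        e = gamma * lam * e + x                  # accumulate trace
        w = w + alpha * delta * e                # credit recent states
    return w

def expected_td_lambda(features, rewards, state_ids, n_states,
                       alpha=0.1, beta=0.1, gamma=0.9, lam=0.8):
    """Sketch of the expected-trace idea: track zbar[s] ~ E[e_t | S_t = s]
    and update the weights with it instead of the instantaneous trace."""
    w = np.zeros_like(features[0])
    e = np.zeros_like(features[0])
    zbar = np.zeros((n_states, features[0].shape[0]))
    for t, r in enumerate(rewards):
        x, x_next = features[t], features[t + 1]
        s = state_ids[t]
        e = gamma * lam * e + x
        zbar[s] += beta * (e - zbar[s])          # running estimate of E[e|s]
        delta = r + gamma * w @ x_next - w @ x
        w = w + alpha * delta * zbar[s]          # update via expected trace
    return w, zbar
```

On a short two-state chain, a reward at the end propagates credit back to both states in one pass, with the earlier state receiving a λ-discounted share.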


Similar Resources

Bidding Strategy on Demand Side Using Eligibility Traces Algorithm

Restructuring in the power industry is followed by splitting it into different parts and creating competition between the purchasing and selling sections. As a consequence, through active participation in the energy market, service provider companies and large consumers create a context for overcoming the problems resulting from the lack of demand-side participation in the market. The most prominent ch...


Eligibility Traces for Off-Policy Policy Evaluation

Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference methods. Here we generalize eligibility traces to off-policy learning, in which one learns about a policy different from the policy that generates the data. Off-policy methods can greatly multiply learning, as many policie...
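As a rough sketch of how traces are generalised to off-policy learning (one common importance-sampling variant, not necessarily the exact algorithm of this paper), each step's trace can be reweighted by the ratio `rho = pi(a|s) / b(a|s)` of target-policy to behaviour-policy probabilities:

```python
import numpy as np

def off_policy_td_lambda(features, rewards, rhos,
                         alpha=0.1, gamma=0.9, lam=0.8):
    """Off-policy linear TD(lambda) with importance-weighted traces.

    rhos[t] = pi(a_t | s_t) / b(a_t | s_t): target over behaviour
    probability for the action actually taken at step t.
    """
    w = np.zeros_like(features[0])
    e = np.zeros_like(features[0])
    for t, r in enumerate(rewards):
        x, x_next = features[t], features[t + 1]
        delta = r + gamma * w @ x_next - w @ x
        e = rhos[t] * (gamma * lam * e + x)   # IS-corrected trace
        w = w + alpha * delta * e
    return w
```

When all ratios are 1 (target and behaviour policies coincide), this reduces exactly to the on-policy update; a ratio of 0 cuts credit to actions the target policy would never take.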


Recursive Least-Squares Learning with Eligibility Traces

In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We describe a systematic approach for adapting on-policy learning least squares algorithms of the literature (LSTD [5], LSPE [15], FPKF [7] and GPTD [8]/KTD [10]) to off-policy learning w...
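For context, the on-policy least-squares baseline being adapted can be sketched as follows (a minimal batch LSTD(λ), with an illustrative ridge term for invertibility; the off-policy adaptations discussed above add importance corrections on top of this): accumulate `A = Σ e_t (x_t − γ x_{t+1})ᵀ` and `b = Σ e_t r_{t+1}`, then solve `A w = b`.

```python
import numpy as np

def lstd_lambda(features, rewards, gamma=0.9, lam=0.8, reg=1e-6):
    """Batch on-policy LSTD(lambda) for linear value estimation.

    Solves A w = b where A and b accumulate trace-weighted statistics
    of one trajectory; reg is a small ridge term keeping A invertible.
    """
    n = features[0].shape[0]
    A = reg * np.eye(n)
    b = np.zeros(n)
    e = np.zeros(n)
    for t, r in enumerate(rewards):
        x, x_next = features[t], features[t + 1]
        e = gamma * lam * e + x                    # eligibility trace
        A += np.outer(e, x - gamma * x_next)       # model statistics
        b += e * r
    return np.linalg.solve(A, b)
```

On a two-state chain ending in a reward of 1, the solution recovers the exact discounted values v(B) = 1 and v(A) = γ·v(B) = 0.9 from a single trajectory.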


iLSTD: Eligibility Traces and Convergence Analysis

We present new theoretical and empirical results with the iLSTD algorithm for policy evaluation in reinforcement learning with linear function approximation. iLSTD is an incremental method for achieving results similar to LSTD, the data-efficient, least-squares version of temporal difference learning, without incurring the full cost of the LSTD computation. LSTD is O(n²), where n is the number of...
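A rough sketch of the incremental idea (illustrative names only; the actual iLSTD uses a decaying step-size schedule and careful per-step complexity accounting): maintain `A` and `b` as in LSTD together with the residual `mu = b - A @ w`, and instead of solving the full system each step, spend a few cheap single-dimension updates on `w`.

```python
import numpy as np

def ilstd_episode(features, rewards, A, b, w, mu,
                  alpha=0.1, gamma=0.9, lam=0.8, m=1):
    """One episode of an iLSTD-style update (sketch, fixed step size).

    Keeps the invariant mu = b - A @ w while updating only the m
    dimensions of w with the largest residuals at each step.
    """
    e = np.zeros_like(w)
    for t, r in enumerate(rewards):
        x, x_next = features[t], features[t + 1]
        e = gamma * lam * e + x
        dA = np.outer(e, x - gamma * x_next)
        db = e * r
        A += dA
        b += db
        mu += db - dA @ w                  # keep mu = b - A @ w in sync
        for _ in range(m):
            j = np.argmax(np.abs(mu))      # most "urgent" dimension
            step = alpha * mu[j]
            w[j] += step
            mu -= step * A[:, j]           # mu reflects the new w
    return A, b, w, mu
```

Each inner update costs O(n) rather than the O(n²) of a full LSTD step; with a fixed `alpha`, stability requires `alpha` to be small relative to the accumulated magnitude of `A`, which is why the original method decays the step size over time.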


Evidence for eligibility traces in human learning

Whether we prepare a coffee or navigate to a shop: in many tasks we make multiple decisions before reaching a goal. Learning such state-action sequences from sparse reward raises the problem of credit assignment: which actions out of a long sequence should be reinforced? One solution provided by reinforcement learning (RL) theory is the eligibility trace (ET): a decaying memory of the state-acti...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i11.17200